Efficient iterative policy optimization

Author

  • Nicolas Le Roux
Abstract

We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.
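As a rough illustration of how a concave lower bound on the expected reward can arise, the derivation below uses the elementary inequality x ≥ 1 + log x. It is a sketch under stated assumptions, not necessarily the exact construction used in the paper: the symbols J, R, π_θ, θ_t and L_t are notation introduced here, and rewards are assumed nonnegative.

```latex
% Assumed notation (not from the abstract):
%   \pi_\theta      policy with parameters \theta
%   \theta_t        parameters of the current policy
%   R(x) \ge 0      nonnegative reward of outcome x
\begin{align*}
J(\theta)
  &= \mathbb{E}_{x \sim \pi_\theta}\big[ R(x) \big]
   = \mathbb{E}_{x \sim \pi_{\theta_t}}\!\left[ R(x)\,
       \frac{\pi_\theta(x)}{\pi_{\theta_t}(x)} \right] \\
  &\ge \mathbb{E}_{x \sim \pi_{\theta_t}}\!\left[ R(x)
       \left( 1 + \log \frac{\pi_\theta(x)}{\pi_{\theta_t}(x)} \right) \right]
   =: L_t(\theta),
  \qquad L_t(\theta_t) = J(\theta_t).
\end{align*}
```

If log π_θ(x) is concave in θ, then L_t is concave, and maximizing it gives θ_{t+1} with J(θ_{t+1}) ≥ L_t(θ_{t+1}) ≥ L_t(θ_t) = J(θ_t), so each update cannot decrease the expected reward. The inequality step relies on R(x) ≥ 0, which is why negative rewards need separate treatment.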

Similar resources

An Efficient Heuristic Optimization Algorithm for a Two-Echelon (R, Q) Inventory System

This paper presents a two-echelon non-repairable spare parts inventory system that consists of one warehouse and m identical retailers and implements the reorder point, order quantity (R, Q) inventory policy. We formulate the policy decision problem in order to minimize the total annual inventory investment subject to average annual ordering frequency and expected number of backorder constraint...

Scalable and Fair Admission Control for On-Chip Nanophotonic Crossbars

Advances in CMOS-compatible photonic elements have made it plausible to exploit nanophotonic communications to overcome the limitations of traditional NoCs. Amongst various proposed nanophotonic architectures, optical crossbars have been shown to provide high performance in terms of bandwidth and latency. In general, optical crossbars provide a vast volume of network resources that are shared a...

Energy-Efficient Cognitive Radio Sensor Networks: Parametric and Convex Transformations

Designing energy-efficient cognitive radio sensor networks is important to intelligently use battery energy and to maximize the sensor network life. In this paper, the problem of determining the power allocation that maximizes the energy-efficiency of cognitive radio-based wireless sensor networks is formed as a constrained optimization problem, where the objective function is the ratio of netw...

An efficient improvement of the Newton method for solving nonconvex optimization problems

Newton method is one of the most famous numerical methods among the line search methods to minimize functions. It is well known that the search direction and step length play important roles in this class of methods to solve optimization problems. In this investigation, a new modification of the Newton method to solve unconstrained optimization problems is presented. The significant ...

An Iterative Heuristic Optimization Model for Multi-Echelon (R, Q) Inventory Systems

Large multi-echelon inventory systems usually consist of hundreds of thousands of stock keeping units (SKUs). Calculating inventory policies for each product is a computational burden that calls for more efficient policy-setting techniques that reduce computational time and increase managerial convenience. The main objective of our research is to investigate the effect of segmentat...

Journal title:
  • CoRR

Volume abs/1612.08967

Publication date 2016